ABSTRACT

The Asteroseismic Modeling Portal (AMP) provides a web-based interface for astronomers to run and view simulations that derive the properties of Sun-like stars from observations of their pulsation frequencies. In this paper, we describe the architecture and implementation of AMP, highlighting the lightweight design principles and tools used to produce a functional, fully custom web-based science application in less than a year. Targeted as a TeraGrid science gateway, AMP's architecture and implementation are intended to simplify its orchestration of TeraGrid computational resources. AMP's web-based interface was developed as a traditional standalone database-backed web application using the Python-based Django web development framework, allowing us to leverage the Django framework's capabilities while cleanly separating the user interface development from the grid interface development. We have found this combination of tools flexible and effective for rapid gateway development and deployment.

Categories and Subject Descriptors 

H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services.

1. INTRODUCTION

In March 2009, NASA launched the Kepler satellite as part of a mission to identify potentially habitable Earth-like planets. Kepler detects planets by observing extrasolar transits (brief dips in observed brightness as a planet passes between its star and the satellite), which can be used to determine the size of the planet relative to the size of the star. However, to calculate the absolute size of an extrasolar planet, the size of the star must also be known. Asteroseismology can be used to determine the properties of Sun-like stars from observations of their pulsation frequencies, yielding the precise absolute size of a distant star and thus the absolute size of any detected extrasolar planets. The Asteroseismic Modeling Portal (AMP, http://amp.ucar.edu) presents a web-based interface to the MPIKAIA asteroseismology pipeline [6] for a broad international community of researchers, facilitating automated model execution and simplifying data sharing among research groups.

While the MPIKAIA asteroseismology pipeline itself has been available for astronomers to download and run on their own resources for several years, its potential use for processing Kepler data provided compelling motivation to present the model as a science gateway. The most substantial barriers to an astronomer running the model on a local resource are MPIKAIA's high computational requirements and its straightforward but high-maintenance workflow. Running a single MPIKAIA simulation requires propagating several independent batches of MPI jobs and can consume 512 processors for over a week of wall-clock time. More importantly, the results of these asteroseismology simulations are of interest to an international community of researchers. Presenting the model via a science gateway allows researchers without local resources to run the model, disseminates model results to the community without redundant computation, and produces a uniform analysis of asteroseismic data for many stars of interest.

The straightforward workflow implemented by AMP also provided an opportunity to develop a new science gateway while exploring a new architecture, web application framework, and supporting technologies. One of the first steps when designing a science gateway is to select the collection of technologies, such as frameworks and toolkits, that will be used to construct the gateway. As noted by M. Thomas when similarly evaluating frameworks for science gateway development, gateways can be constructed using tools that vary greatly in complexity and features, with the most feature-rich frameworks often introducing substantial development complexity [12]. Indeed, many of the prior science gateway projects at the National Center for Atmospheric Research (NCAR) followed the design pattern typical of many gateways, using Java to implement complex and highly extensible service-oriented architectures and web portals. In the most notable departure from our prior work [5], AMP does not use an application-specific service-oriented architecture and is not written in Java.

For the design and implementation of AMP, our objective was to create a web-based, science-driven application that used Grid technologies only peripherally, to enable back-end use of supercomputing resources. We prioritized minimizing development time and complexity while retaining full creative control of the user interface by selecting the Django rapid-development web framework and implementing the Grid functionality with command-line toolkit interfaces.

Because of AMP's computational requirements, AMP has been designed since its inception to target TeraGrid resources. Many of the best practices and procedures for developing and deploying science gateways on the TeraGrid were proposed coincident with our initial exploration of TeraGrid as AMP's computational platform. As such, AMP also provides an example of constructing a new science gateway specifically for TeraGrid cyberinfrastructure, rather than the common case of extending an existing gateway to use TeraGrid. AMP's architecture separates the web-based user interface from the workflow system performing Grid operations, isolating interactive users both logically and physically from TeraGrid operations. We used only components common to all TeraGrid resource providers, with the goal of facilitating easy deployment on current TeraGrid-managed resources without any resource provider assistance.

The remainder of this paper is organized as follows. Section 2 describes the asteroseismology model workflow and computational requirements. Sections 3 and 4 describe the architecture, design, and implementation of AMP. Section 5 discusses our experiences with AMP's implementation, emphasizing the potential usefulness of the design principles for future gateway projects, and the paper concludes with continuing and future work.

2. BACKGROUND 

The asteroseismology workflow provided by AMP consists of two components: a forward stellar model and a genetic algorithm (GA) that invokes the forward model as a subroutine. The forward stellar model is the Aarhus Stellar Evolution Code (ASTEC) [4], a single-processor code that takes as input five floating-point physical parameters (mass, metallicity, helium mass fraction, convective efficiency, and age) and constructs a model of the star's evolution through the specified age. The output of the model includes observable data such as the star's temperature, luminosity, and pulsation frequencies. In addition to the scalar parameter output, ASTEC produces data that can be used to generate basic graphical plots describing the star's characteristics, including a Hertzsprung-Russell diagram showing the star's temperature and luminosity and an Echelle plot summarizing the star's oscillation frequencies.

In practice, however, the inverse problem must be solved: ASTEC models a star with known properties and produces its observable characteristics, while the real research product requires starting with observations and identifying the properties of a star that could produce those observations. To derive the properties of distant stars from observations, ASTEC is coupled with the MPIKAIA parallel GA [6] to create an automated stellar processing pipeline [7]. The GA creates a population of candidate stars with a variety of physical parameters, models each star using ASTEC, and then evaluates each candidate star for similarity to the observed data. Over many iterations, the GA converges to identify an optimal candidate star that has the properties most likely to produce the observed data. The candidate star is then subjected to a solution detail run that further refines the star's characteristics at a finer granularity and produces the final model output.
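To make the inverse-problem loop concrete, here is a deliberately tiny genetic algorithm in Python 3. It is a sketch under stated assumptions, not MPIKAIA: `toy_forward_model` is a hypothetical, linear stand-in for ASTEC with only two parameters, the chi-square fitness is a generic choice, and truncation selection, uniform crossover, and Gaussian mutation are textbook operators rather than the ones MPIKAIA uses. The step that MPIKAIA parallelizes, modeling every candidate in a generation, corresponds to the ranking call inside the loop.

```python
import random
from typing import List, Sequence, Tuple

Params = Tuple[float, ...]


def toy_forward_model(params: Params) -> List[float]:
    """Hypothetical stand-in for ASTEC: maps (mass, age) to three 'observables'."""
    mass, age = params
    return [100.0 * mass + 5.0 * age, 150.0 * mass - 2.0 * age, 200.0 * mass + age]


def chi_square(model: Sequence[float], observed: Sequence[float]) -> float:
    """Sum of squared residuals; smaller means a closer match to the observations."""
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def run_ga(observed, bounds, forward=toy_forward_model, pop_size=40,
           iterations=200, seed=0):
    """Search `bounds` for parameters whose forward-model output matches `observed`."""
    rng = random.Random(seed)  # each GA in an ensemble would get its own seed

    def fitness(candidate: Params) -> float:
        return chi_square(forward(candidate), observed)

    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    for _ in range(iterations):
        # Model and rank every candidate (the step MPIKAIA spreads across MPI ranks).
        ranked = sorted(population, key=fitness)
        parents = ranked[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then a small Gaussian mutation clipped to the bounds.
            child = tuple(
                min(hi, max(lo, rng.choice((x, y)) + rng.gauss(0.0, 0.01 * (hi - lo))))
                for x, y, (lo, hi) in zip(a, b, bounds)
            )
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best)
```

For example, fitting the output of `toy_forward_model((1.0, 4.0))` within bounds of 0.7-1.3 for mass and 1-10 for age should recover parameters close to mass 1.0 and age 4.0.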



AMP supports both modes of execution from its web-based user interface: running the forward model with specific model parameters (a "direct model run"), and executing the GA to identify model parameters that produce observed data (an "optimization run"). Direct model runs are trivial to configure and execute: they require five floating-point parameters as input, take 10-15 minutes to execute on a single processor, and produce a few kilobytes of output. Optimization runs are both more complex and more computationally intensive.

The optimization run workflow consists of an ensemble of independent GA runs, with each run requiring the execution of multiple sequential tasks (see Figure 1). For each optimization run, multiple separate GAs are executed and allowed to converge independently. Each GA (and indeed each task) is started with randomly generated seed parameters to encourage the GA to explore a wide parameter space, avoid local minima, and provide confidence in the optimality of the final result. The GAs can take from hours to days to converge, depending on system performance and the number of iterations requested, so a GA may not converge in a single task execution within the target supercomputer's walltime limit. Thus, each GA run may require several invocations of the executable to converge to a solution. When all of the GA runs in the ensemble are complete, the best solution is evaluated using the forward model to produce detailed output for presentation and analysis.
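The checkpoint-and-resubmit pattern described above can be sketched as follows in Python 3. Everything here is hypothetical: `RestartState` stands in for the GA restart file, `per_task` is the number of iterations assumed to fit in one batch job's walltime, and `best_fitness_at` is a callback standing in for the real GA. The ensemble runs sequentially in this sketch, whereas AMP runs its GAs in parallel.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RestartState:
    """Hypothetical contents of a GA restart file carried between batch jobs."""
    seed: int
    iterations_done: int = 0
    best_fitness: float = math.inf


def run_one_task(state: RestartState, target: int, per_task: int,
                 best_fitness_at: Callable[[int, int], float]) -> RestartState:
    """One walltime-limited job: advance at most `per_task` iterations, then checkpoint."""
    stop = min(target, state.iterations_done + per_task)
    while state.iterations_done < stop:
        state.iterations_done += 1
        state.best_fitness = min(state.best_fitness,
                                 best_fitness_at(state.seed, state.iterations_done))
    return state


def run_ensemble(seeds: List[int], target: int, per_task: int,
                 best_fitness_at: Callable[[int, int], float]
                 ) -> Tuple[RestartState, Dict[int, int]]:
    """Run each GA to `target` iterations via repeated tasks; return the best GA."""
    invocations: Dict[int, int] = {}
    finished: List[RestartState] = []
    for seed in seeds:  # independent GAs, each with its own random seed
        state = RestartState(seed)
        invocations[seed] = 0
        while state.iterations_done < target:  # resubmit from the restart state
            state = run_one_task(state, target, per_task, best_fitness_at)
            invocations[seed] += 1
        finished.append(state)
    # The winner would then be passed to a single forward-model "solution detail run".
    best = min(finished, key=lambda s: s.best_fitness)
    return best, invocations
```

With 200 target iterations and 70 iterations per task, each GA in this sketch needs three job invocations (70 + 70 + 60).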

In the current configuration for the Kepler data analysis, each optimization run consists of four GA runs executed in parallel, and each GA models a population of 126 stars (using 128 processors) for 200 iterations. One interesting artifact of the ASTEC model is that its execution time varies slightly depending on the target star's characteristics. During the first few iterations, some stars in the randomly chosen population may take more time to model than others. Because each iteration blocks until all stars in the population have been modeled, the iteration run time is set by the longest-running star. However, as the model continues and the population begins to converge, the model run time for each star also converges and the time to run each iteration decreases. Thus, the 200 iterations can be performed in about 160 to 180 times the first iteration's measured time.

As part of the allocation request for TeraGrid resources, the stellar model was benchmarked on four TeraGrid platforms (see Table 1). From the astronomer's perspective, the most important metric is the predicted optimization run (GA) run time. The modern Intel and AMD processors in the NICS and TACC resources can propagate the GA to completion in about 40-60 hours, while the slower processors in NCAR's Frost system can require over 12 days. When TeraGrid's service unit (SU) charging factors and the model performance are considered together, the TACC systems are the most efficient platforms for this model, but the systems are generally similar in cumulative charge. For our production deployment, we have targeted the NICS Kraken system because of its short solution time and support for WS-GRAM. The TACC systems demonstrated better performance, but the small disk space available on Lonestar and the lack of WS-GRAM on Ranger, combined with the current allocation oversubscription on those systems, discouraged their use for this project. For additional computational volume, we continue to use NCAR's Frost system.

Figure 1: AMP asteroseismology workflow.

System          Stellar model    Optimization run (genetic algorithm)
                run time (min)   Run time (h)   CPUh      SUs/CPUh   TeraGrid SUs
NCAR Frost      110.0            293.3          150,187   0.558      83,804
NICS Kraken     23.6             61.9           31,723    1.623      51,486
TACC Lonestar   15.1             40.4           20,670    1.935      39,996
TACC Ranger     21.1             56.2           28,771    1.644      47,229

Table 1: Measured stellar benchmark run time, and estimated optimization run time and SU charge, for selected TeraGrid systems. An optimization run performs 200 GA iterations and requires about 160 times the model benchmark time to complete, and each optimization run executes four 128-processor GA jobs.
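The optimization-run estimates in Table 1 follow from simple arithmetic: run time is about 160 times the stellar model benchmark (which is in minutes, since 110.0 min × 160 ≈ 293.3 h for Frost), every optimization run occupies four 128-processor GA jobs, and SUs are CPU-hours multiplied by the system's charging factor. The Python 3 sketch below encodes that arithmetic; the constant and function names are ours, and with the NCAR Frost inputs it reproduces the Frost row.

```python
from typing import Tuple

GA_RUNS_PER_OPTIMIZATION = 4  # four GA runs executed in parallel
PROCESSORS_PER_GA = 128       # each GA job uses 128 processors
BENCHMARK_MULTIPLIER = 160    # optimization run takes ~160x the benchmark time


def estimate_optimization_cost(benchmark_minutes: float, sus_per_cpuh: float,
                               multiplier: float = BENCHMARK_MULTIPLIER
                               ) -> Tuple[float, float, float]:
    """Return (wall-clock hours, CPU-hours, TeraGrid SUs) for one optimization run."""
    run_hours = benchmark_minutes * multiplier / 60.0
    cpu_hours = run_hours * GA_RUNS_PER_OPTIMIZATION * PROCESSORS_PER_GA
    return run_hours, cpu_hours, cpu_hours * sus_per_cpuh


hours, cpuh, sus = estimate_optimization_cost(110.0, 0.558)  # NCAR Frost inputs
print(f"{hours:.1f} h, {cpuh:,.0f} CPUh, {sus:,.0f} SUs")   # 293.3 h, 150,187 CPUh, 83,804 SUs
```

Passing `multiplier=180` gives the upper end of the 160-180x range noted in Section 2.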

3. ARCHITECTURE 

The high-level AMP architecture reflects our principal design goals of supporting rapid development and explicitly targeting TeraGrid computational resources. The architecture consists of three main components: the web-based user interface, the "GridAMP" workflow daemon that functions as a grid client, and the remote computational resources running the model (see Figure 2). The separation of these three main components is fundamental to the architecture.

With respect to supporting rapid development, one advantage of separating AMP's functional components is that it supports specialized labor. This approach largely decouples the tasks of web development, back-end Grid software engineering, and the debugging and maintenance of the science software itself. This is particularly beneficial because it is much easier to find students to work on web-related development (e.g., undergraduates) than to find students who possess a thorough understanding of the intricacies of Grid infrastructure and middleware (e.g., graduate students with several years of experience), to say nothing of finding students who can work proficiently (and efficiently) with both. Because the interface and Grid components are not tightly coupled, they can be easily developed and maintained by individuals with complementary skill sets. We have continued the separation concept through to the science code itself by running the code in an environment identical to that used by the astronomy principal investigator and colleagues. Rather than dispatching software engineers or students to maintain the application, the science PI occasionally updates the Grid-executed code personally, using sudo on the remote resource.

Figure 2: AMP high-level architecture.

Separating the user interface from the grid-related processing components also simplifies the administrative responsibilities associated with using TeraGrid computational resources. In particular, one concern often associated with science gateways is their use of a shared credential to submit jobs on behalf of a community of individual gateway users [11]. Gateways that use TeraGrid resources are required to maintain user registries and to associate every Grid request with a specific gateway user. To provide end-to-end user accounting for all gateway jobs and to allow resource providers to identify the real users acting behind community credentials, TeraGrid has developed and deployed the GridShib SAML extensions [8]. However, an underlying risk remains: a science gateway typically runs a publicly accessible web server and also must possess the credentials necessary to access many machines on the TeraGrid.

The AMP architecture addresses this concern by separating users from the community account credential, placing them on distinct servers. The user interacts with a web portal located on one publicly accessible server, while all back-end processing and remote Grid operations are performed by the GridAMP daemon on another server. All communication between the AMP portal and the GridAMP daemon is performed asynchronously by manipulating a database located on yet another server. Moreover, the roles and privileges of the public web portal and the GridAMP daemon are strictly managed and controlled. The public web portal is essentially a database-driven web server without any Grid connectivity or Grid software. The server hosting the GridAMP daemon is accessible only to the developers using SSH keys, and only GridFTP is externally exposed to facilitate data staging via the community account credential. All input data from users is marshaled through the SQL database. Incoming user data is parsed by the web server and uploaded to database tables with strict data type constraints. When required, the input files are regenerated from the database by the GridAMP daemon and then staged to TeraGrid systems. It is thus exceptionally difficult to send any data other than a properly formatted asteroseismology input file to a TeraGrid resource, and even a full root compromise of the web server does not provide access to any credentials used to access any other system. This architectural feature helps AMP comply at the most fundamental level with the TeraGrid science gateway security best practices [10].
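The input-marshaling rule above (parse into strictly typed values on the web server, regenerate the file on the daemon) can be sketched in Python 3 as follows. The record type, its five field names (mass, metallicity, helium fraction, mixing length, age), the one-value-per-line output format, and both function names are hypothetical; database storage between the two steps is omitted.

```python
import math
from dataclasses import dataclass, fields
from typing import Dict


@dataclass(frozen=True)
class DirectRunInput:
    """Hypothetical typed record mirroring a strictly typed database row."""
    mass: float
    metallicity: float
    helium_fraction: float
    mixing_length: float
    age: float


def parse_user_input(form: Dict[str, str]) -> DirectRunInput:
    """Web-server side: accept exactly the expected fields, each as a finite float."""
    expected = [f.name for f in fields(DirectRunInput)]
    unexpected = set(form) - set(expected)
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    values = {}
    for name in expected:
        if name not in form:
            raise ValueError(f"missing field: {name}")
        value = float(form[name])  # raises ValueError for non-numeric text
        if not math.isfinite(value):
            raise ValueError(f"{name} must be finite")
        values[name] = value
    return DirectRunInput(**values)


def render_input_file(record: DirectRunInput) -> str:
    """Daemon side: regenerate the model input file from typed values only."""
    return "".join(f"{getattr(record, f.name):.6e}\n" for f in fields(record))
```

Because the daemon writes the file only from typed numbers, free text such as `"4.5; rm -rf /"` is rejected at parse time and can never reach the remote system.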



4. IMPLEMENTATION 

The AMP gateway and the GridAMP daemon are implemented in Python 2.4/2.6 using the Django web development framework [2]. Django's primary intended use is as a web development platform, but over two software engineering iterations, we adopted Django as the underlying framework for both the AMP website and the GridAMP daemon. We were able to perform two complete cycles of a "spiral-model" software engineering process in about one year, completely re-implementing the entire website and processing daemon about six months after initial prototyping commenced.

In our first development prototyping cycle, we perhaps took the separation-of-components concept too far: we used Django to implement the website but implemented the GridAMP daemon in Python using manually coded SQL database calls. This made sense at the time. Although Django provides a full-featured object-relational mapper (ORM) independent of its web server-related features, we were skeptical that the ORM would be sufficiently robust to fulfill our requirements. For example, we demanded direct and explicit control of the database schema and wanted to use database permissions to carefully control access to database tables on a per-user basis. Even the idea of allowing an ORM system to create tables based on Python object definitions seemed irreconcilable with a production-quality science gateway implementation. Over the first six months of development, however, it became clear that this was not the case: the Django ORM was more powerful and flexible than we had imagined possible. We were able to easily redefine our previously hand-specified database schema entirely using Django with perfect table/field/type correspondence, including our desired permissions scheme, all from within Django's ORM. Moreover, the database schema could be reconstructed on demand, including sample data, in test databases when required for development work. The ORM also worked from standalone programs outside of Django's web-serving infrastructure.

Thus, the usefulness of Django's "don't repeat yourself" philosophy quickly became apparent and immediately applicable to AMP. While the service separation philosophy can be taken to an extreme (we could even have switched languages between the web server and the GridAMP daemon), maintaining two separate codebases quickly became a mundane waste of time. We therefore maintained the operational separation of the website and GridAMP daemon but unified the framework for both components. The entire project now uses a single code base to define and manipulate shared data structures across multiple servers.

4.1 Common Components 

Software written with the Django framework is organized into "projects" and "applications". A project essentially represents a website and consists of a common configuration and a collection of installed, possibly independent, applications. Applications are written using the typical model-view-controller design pattern, better described as model-template-view in Django's terminology. Models use the ORM to abstract database access behind Python objects while providing the opportunity to add custom functionality. When an HTTP request is received, the request is dispatched to the appropriate Python subroutine (a "view") to perform the necessary processing. View routines then usually conclude by rendering final output to the user via Django's template engine.

For AMP, we implemented most of the science gateway 
functionality in a single core application consisting of ORM 
models and support routines. For example, the catalog of 
stars, their identifiers, the simulations, and the constituent 
supercomputer jobs are all stored in this core application. 
This effectively makes the most important components of 
AMP first-class global objects when imported properly. The 
web interface is then constructed of additional applications 
that refer to the core application as required. Only this core 
application's models are shared between the website and the 
GridAMP daemon. 

For both the web server and the GridAMP daemon, we also adopted Django's built-in authentication ("auth") framework. The authentication framework provides basic website user management functionality, including common user-initiated account manipulation activities. We extended the Django authentication framework to support additional information required by AMP and TeraGrid, such as data provenance and user authentication metadata.

An additional benefit of using the Django ORM and authentication framework is Django's built-in administrative interface, which can manipulate ORM objects, including those created by the authentication framework. The interface is also easily modified to support custom requirements. Thus, administrative tasks such as approving users or adjusting back-end parameters (like allocations and the authorization for a user to submit to a machine using a particular allocation) can easily be performed from a graphical interface without custom development. The interface is available only to developers running the Django development server with appropriate database connectivity, so the administrative functionality is not even reachable from any publicly accessible web server.

4.2 User Interface 

In addition to the shared Django application that contains the core AMP models, we wrote separate Django applications to implement independent portions of the website functionality. One application allows users to browse and search star catalogs, one allows users to view completed simulation results, and another facilitates simulation submission. These applications do not contain models, so they are useful only within the context of a Django project containing the core AMP application, but the distinction provides a logical separation of site components.

We also wrote additional standalone Django applications containing potentially reusable code. For example, we wished to use a CAPTCHA to reduce the possibility of automated bots requesting AMP accounts. Because of our accessibility requirements, a typical image-only CAPTCHA was problematic, so we decided to write our own. Our general-purpose question/answer CAPTCHA presents a series of questions with optional links to answers. For AMP, users are asked to enter the HD catalog numbers of popular stars, such as "What is the HD number for Alpha Centauri?" For astronomers who cannot remember, we present a link to the page containing the answer. With this in place, only one real estate agent turned fashion supermodel has requested the ability to submit AMP jobs.
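A question/answer CAPTCHA like the one described above needs little more than a question bank and a forgiving answer check. The Python 3 sketch below is illustrative only: the data structure, the normalization rule (ignore case, whitespace, and an optional "HD" prefix), and the sample questions are our assumptions, not AMP's code. Alpha Centauri is a binary, so both HD 128620 (component A) and HD 128621 (component B) are accepted. In a web application, the chosen question index should be stored server-side (for example, in the session) so that a bot cannot pick its own question.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Question:
    prompt: str
    answers: FrozenSet[str]         # accepted answers, already normalized
    hint_url: Optional[str] = None  # optional link to a page containing the answer


QUESTIONS: List[Question] = [
    Question("What is the HD number for Alpha Centauri?", frozenset({"128620", "128621"})),
    Question("What is the HD number for Sirius?", frozenset({"48915"})),
    Question("What is the HD number for Vega?", frozenset({"172167"})),
]


def normalize(text: str) -> str:
    """Drop whitespace, ignore case, and strip an optional 'HD' catalog prefix."""
    cleaned = "".join(text.split()).upper()
    return cleaned[2:] if cleaned.startswith("HD") else cleaned


def pick_question(rng: random.Random) -> int:
    """Choose a question index to store server-side, e.g. in the user's session."""
    return rng.randrange(len(QUESTIONS))


def check_answer(index: int, response: str) -> bool:
    """True if the normalized response is one of the question's accepted answers."""
    return normalize(response) in QUESTIONS[index].answers
```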

AMP's web interface is quite typical of current database-driven websites in that it combines static and dynamic web technologies to provide its user experience. AMP uses AJAX-based "Web 2.0" techniques to simplify the user experience where possible, but the site is fully functional without these JavaScript enhancements. For example, the star search uses AJAX to suggest stars that already have results or are in the Kepler catalog. If no matching star is in AMP's catalog, the search is passed to the SIMBAD [9] astronomical database and the target, if found, is added to the local catalog. Finally, AMP uses Django's SSL authentication and session management support to ensure that all activities performed by registered users are encrypted.
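The search fallback described above amounts to "look locally, then ask the external database and cache the hit." In the Python 3 sketch below, the catalog is a plain dictionary keyed by identifier and `resolve_remote` is a hypothetical callable standing in for a SIMBAD name-resolution query; no real SIMBAD API is invoked, and alias handling is omitted.

```python
from typing import Callable, Dict, List, Optional

Star = Dict[str, str]


def search_star(query: str, catalog: Dict[str, Star],
                resolve_remote: Callable[[str], Optional[Star]]) -> List[Star]:
    """Return local catalog matches; if there are none, try the external resolver."""
    needle = query.strip().lower()
    matches = [star for name, star in catalog.items() if needle in name.lower()]
    if matches:
        return matches
    found = resolve_remote(query.strip())  # hypothetical SIMBAD lookup
    if found is None:
        return []
    catalog[found["name"]] = found  # cache the new target in the local catalog
    return [found]
```

Injecting the resolver as a parameter keeps the view logic testable without network access.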

4.3 Grid Execution 

To simplify the deployment of the AMP model on TeraGrid systems, we constructed a workflow that uses only basic components provided by the Coordinated TeraGrid Software and Services (CTSS) software stack. Rather than deploying an SOA with services that encapsulate the models, as we have done in the past for other projects, the GridAMP daemon directly formulates and submits GRAM execution requests and GridFTP file transfers. Thus, the model can be deployed on a TeraGrid resource as soon as the community account has been authorized, and no special resource provider dispensations (e.g., custom Globus containers or separate service hosting platforms) are required.

The remote resource execution environment for each AMP 
job is initialized and finalized using shell scripts invoked by 
GRAM using the fork job service. The pre-job stage creates 
a new empty copy of the model runtime directory structure 
and prepopulates the tree with static input files. The model 
is then run using GRAM through the scheduler interface 
with each model invocation staging in the small input data 
text file and staging out its restart progress file. The post-job 
stage uses tar to consolidate output and log files into a single 
file for transfer back to the GridAMP daemon and eventual 
delivery to the user via the website. A final cleanup stage 
ensures that the execution environment has been removed. 
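The pre-job, post-job, and cleanup stages can be illustrated locally in Python 3. AMP performs these steps with shell scripts launched through GRAM's fork job service; the functions, directory names (`output`, `logs`), and archive name here are hypothetical stand-ins that only demonstrate the create, populate, archive, and remove sequence.

```python
import shutil
import tarfile
from pathlib import Path
from typing import Dict


def prejob(run_dir: Path, static_inputs: Dict[str, str]) -> None:
    """Create a fresh runtime tree and pre-populate it with static input files."""
    # Raises FileExistsError if an old output directory is still present.
    (run_dir / "output").mkdir(parents=True)
    (run_dir / "logs").mkdir()
    for name, content in static_inputs.items():
        (run_dir / name).write_text(content)


def postjob(run_dir: Path, archive: Path) -> None:
    """Consolidate output and log files into one compressed tar file for transfer."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(run_dir / "output", arcname="output")
        tar.add(run_dir / "logs", arcname="logs")


def cleanup(run_dir: Path) -> None:
    """Remove the execution environment entirely."""
    shutil.rmtree(run_dir, ignore_errors=True)
```

Between `prejob` and `postjob`, the model itself would run through the batch scheduler, staging in its small input file and staging out its restart file.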

4.4 GridAMP Workflow Daemon 

The GridAMP daemon manages the workflow of AMP simulations on remote grid resources. It reads simulation information from the centralized database, performs the necessary grid client actions, and updates the database accordingly. The AMP website and the GridAMP daemon thus interact asynchronously through the centralized database.

We wrote a custom Python module to handle the grid client functionality via calls to the Globus command-line interfaces. The module supports generating derivative proxy certificates with GridShib SAML extensions, GridFTP, and GRAM. The primary reasons for using our own library were that we already had such functionality in-house, and our familiarity with our grid support module made it seem simpler and more robust than third-party solutions. The most important operational benefit of wrapping command-line clients is that it provides excellent support for troubleshooting. The daemon produces logs that clearly highlight warnings and errors, with the relevant command lines displayed for failure cases. To troubleshoot, a developer needs only to open a new console on the GridAMP server and copy and paste the logged command line at the shell prompt to retry the failed action. The Grid operations are not hidden behind complex object models but are transparent, so problems can be investigated and corrected quickly and easily.
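The troubleshooting benefit of wrapping command-line clients comes from logging the exact command that failed. Below is a minimal Python 3 sketch (3.8 or later, for `shlex.join`) of such a wrapper; it is not AMP's module, and `run_grid_command` and `GridCommandError` are hypothetical names. It works with any client program, for example `globus-url-copy` invoked with a source and a destination URL.

```python
import logging
import shlex
import subprocess
from typing import Sequence

log = logging.getLogger("gridamp")


class GridCommandError(RuntimeError):
    """Raised when a grid client fails; the message is the exact command line."""


def run_grid_command(argv: Sequence[str], timeout: float = 600.0) -> str:
    """Run a command-line client and return its stdout.

    On failure, the shell-quoted command is logged so that a developer can paste
    it into a console on the daemon host and retry it by hand.
    """
    command_line = shlex.join(argv)
    log.info("running: %s", command_line)
    try:
        result = subprocess.run(list(argv), capture_output=True, text=True,
                                timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        log.error("FAILED (%s): %s", exc, command_line)
        raise GridCommandError(command_line) from exc
    if result.returncode != 0:
        log.error("FAILED (exit %d): %s\n%s", result.returncode, command_line,
                  result.stderr.strip())
        raise GridCommandError(command_line)
    return result.stdout
```

Because the logged line is already shell-quoted, it can be pasted verbatim even when arguments contain spaces.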

Because of AMP's straightforward processing requirements, we also wrote our own workflow management daemon. The workflow is represented as a table that maps each state to a list of functions that must all return True before the job proceeds to the next state (see Listing 1). If the job is in a particular state, the functions in that state's list are called. If all return True, the job is set to the indicated next state. In practice, the first function usually checks whether the prior state has completed, and the last function propagates the job to the next state. This simple encoding can represent arbitrary trees of execution, but for AMP the processing is merely linear. The only coding cleverness is the use of inheritance to support AMP's two job types, with a single base class implementing all of the routine functionality. Job queuing, stage-in, and stage-out are all handled by the base class. Only the functions that generate the GRAM job definitions and perform model postprocessing are implemented in the derived classes. Thus, the derived classes are very small and contain only model-specific execution and postprocessing code.

Workflow state management and job status tracking are integrated with AMP's data model as implemented using the Django ORM and stored in the centralized database. We used a two-level approach to workflow status management, integrating the simulation status in the application-specific data models while maintaining constituent grid job status in a more generic fashion. To manage the workflow, the daemon first polls the status of each grid job and updates the job records accordingly. This process is identical for all grid jobs regardless of purpose (pre-job, post-job, or simulation) or execution method (fork or queue), and no special callbacks or processing are performed as part of the grid job status update procedure. Once the grid job status has been updated, the workflow management code simply retrieves the last-known status of the appropriate job and waits or proceeds accordingly. One advantage of this approach is that simulation status is integrated at the highest level of the application-specific data model, so the user interface does not need to analyze the state of many individual grid jobs to determine the current state of a simulation.



Listing 1: Example GridAMP workflow definition 



    # Each state maps to (functions that must all return True, next state).
    self.workflow = {
        'QUEUED':  ([self.check_queued_sim, self.submit_prejob], 'PREJOB'),
        'PREJOB':  ([self.check_prejob, self.submit_workjob], 'RUNNING'),
        'RUNNING': ([self.check_workjob, self.submit_postjob], 'POSTJOB'),
        'POSTJOB': ([self.check_postjob, self.postprocess, self.submit_cleanup],
                    'CLEANUP'),
        'CLEANUP': ([self.check_cleanup, self.close_simulation], 'DONE'),
    }
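The state table in Listing 1 can be exercised with a small driver. The Python 3 sketch below is a hypothetical in-memory version: it keeps Listing 1's state names and method names, uses a set of finished job names in place of polled grid-job records, and relies on `all()` stopping at the first function that returns False, so a submission is skipped until its check passes. As described above, the base class supplies the routine checks and submissions, while a derived class supplies only the job definition and postprocessing.

```python
from typing import Callable, Dict, List, Set, Tuple

Stage = Tuple[List[Callable[[], bool]], str]


class BaseWorkflow:
    """Routine stages shared by all job types (hypothetical in-memory version)."""

    def __init__(self) -> None:
        self.state = "QUEUED"
        self.finished_jobs: Set[str] = set()  # stand-in for polled grid-job status
        self.submitted: List[str] = []        # record of submissions, for illustration
        self.workflow: Dict[str, Stage] = {
            "QUEUED": ([self.check_queued_sim, self.submit_prejob], "PREJOB"),
            "PREJOB": ([self.check_prejob, self.submit_workjob], "RUNNING"),
            "RUNNING": ([self.check_workjob, self.submit_postjob], "POSTJOB"),
            "POSTJOB": ([self.check_postjob, self.postprocess, self.submit_cleanup],
                        "CLEANUP"),
            "CLEANUP": ([self.check_cleanup, self.close_simulation], "DONE"),
        }

    def step(self) -> bool:
        """Call the current state's functions; advance only if every one returns True."""
        if self.state == "DONE":
            return False
        functions, next_state = self.workflow[self.state]
        if all(f() for f in functions):  # all() stops at the first False
            self.state = next_state
            return True
        return False

    def _submit(self, name: str) -> bool:
        self.submitted.append(name)  # a real daemon would issue GRAM requests here
        return True

    # Routine checks: wait until the corresponding grid job has finished.
    def check_queued_sim(self) -> bool: return True
    def check_prejob(self) -> bool: return "prejob" in self.finished_jobs
    def check_workjob(self) -> bool: return "workjob" in self.finished_jobs
    def check_postjob(self) -> bool: return "postjob" in self.finished_jobs
    def check_cleanup(self) -> bool: return "cleanup" in self.finished_jobs

    # Routine submissions implemented once in the base class.
    def submit_prejob(self) -> bool: return self._submit("prejob")
    def submit_workjob(self) -> bool: return self._submit("workjob:" + self.job_definition())
    def submit_postjob(self) -> bool: return self._submit("postjob")
    def submit_cleanup(self) -> bool: return self._submit("cleanup")
    def close_simulation(self) -> bool: return True

    # Model-specific hooks that derived classes must provide.
    def job_definition(self) -> str: raise NotImplementedError
    def postprocess(self) -> bool: raise NotImplementedError


class DirectRunWorkflow(BaseWorkflow):
    """Hypothetical derived class for a single-processor direct model run."""

    def job_definition(self) -> str:
        return "count=1"  # placeholder for a GRAM job description

    def postprocess(self) -> bool:
        self.results = "parsed"  # placeholder for parsing the model output
        return True
```

A daemon pass would poll grid-job status, update `finished_jobs`, and then call `step()` on each active simulation.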



As part of the workflow management process, the GridAMP daemon also handles failures and provides user status notifications. Our error management philosophy completely isolates gateway users from the jargon of grid-related failures and transients. Users are not notified of events that they may not understand and are certainly not capable of correcting. Unless the asteroseismology model fails, the simulation will be completed and returned to the user. Users may opt to receive an e-mail when their simulation completes or to receive e-mails at each state transition.

The GridAMP daemon distinguishes among anticipated transients, model processing failures, and its own failures. Anticipated transients, such as remote systems suddenly becoming unreachable for GRAM or GridFTP requests, are handled without involving the user: administrators are notified, the job's status display is supplemented with a plain-text message describing the situation, and the processing is retried automatically without user or administrator intervention. Model failures, such as the absence of a mandatory output file or the failure of a result line to parse correctly, generally require gateway administrator intervention and occasionally escalate to the science investigators for model development work. In the event of a model failure, the simulation is moved to a special "hold" state and both the user and administrator are notified. The gateway administrators can then debug the problem and retry the failed processing steps interactively. Once the problem has been resolved, the workflow resumes automatically. Finally, failures of the GridAMP daemon itself are monitored externally and immediately brought to the attention of the gateway administrators.
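The distinction between transients and model failures can be sketched as a wrapper around each workflow step. The sketch below assumes that steps raise one of two exception types; `TransientError`, `ModelFailure`, `run_step`, `release`, and the `held_from` and `status_message` attributes are hypothetical names, and the external monitoring of the daemon itself is not modeled.

```python
from typing import Any, Callable


class TransientError(Exception):
    """Hypothetical: a remote system is unreachable for a GRAM or GridFTP request."""


class ModelFailure(Exception):
    """Hypothetical: a mandatory output file is missing or a result line fails to parse."""


def run_step(sim: Any, action: Callable[[Any], bool],
             notify_admin: Callable[[str], None],
             notify_user: Callable[[str], None]) -> bool:
    """Run one workflow step and apply the failure policy described above."""
    try:
        return action(sim)
    except TransientError as exc:
        # Hidden from the user apart from a plain-text status note;
        # the step is simply retried on the daemon's next pass.
        notify_admin(f'transient failure: {exc}')
        sim.status_message = 'A remote system is temporarily unavailable; retrying.'
        return False
    except ModelFailure as exc:
        # Park the simulation until an administrator has debugged the problem.
        sim.held_from = sim.state
        sim.state = 'HOLD'
        notify_admin(f'model failure: {exc}')
        notify_user('Your simulation has been placed on hold for investigation.')
        return False


def release(sim: Any) -> None:
    """Administrator action: resume a held simulation from the state it failed in."""
    sim.state = sim.held_from
    sim.status_message = ''
```

Keeping the pre-failure state on the record is one way to let the workflow resume automatically once an administrator has fixed the problem and released the simulation.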

5. DISCUSSION 

Perhaps the most fundamental characteristic of AMP is its posture as a grid-enabled science gateway. When considering our earlier grid gateway projects and a small set of existing grid gateway frameworks, we realized that we did not really want to build a "grid gateway" in the sense suggested by these projects and frameworks. Rather, we wanted a science-driven web-based application focused on delivering the required functionality to our user community that happened to use grid resources and technology to perform some of its computationally intensive processing. To that end, AMP completely hides many aspects of its grid nature from users. Because most astronomers are familiar with high-performance computing, concepts such as simulations, computational jobs, allocations, and supercomputers remain visible terminology, but the word "certificate" is not even mentioned anywhere on the site.

Our ability to decouple AMP's front-end and back-end components was enabled by AMP's straightforward workflow and lengthy job turnaround time. We recognize that many science gateways that facilitate interactive analysis and visualization do not have the luxury of asynchronous coupling. For AMP's jobs, however, decoupled asynchronous processing is appropriate; it simplified the implementation and facilitates operational debugging.

While workflow management is well understood and a variety of robust technologies are available to automate workflows¹, it was quite simple to implement a small-scale custom workflow manager for AMP. In fact, if GRAM ever supports executing pre-job and post-job scripts using the fork service as part of a queued job specification, half of AMP's functionality could be implemented using a single GRAM job submission! For the optimization runs, the most complex portion of the workflow is downloading and interpreting partial result files, which requires a custom implementation regardless of the workflow management paradigm. By writing our own simple workflow management daemon, we have retained a single application-defined representation of all state: the Django models used by the website are also used for execution management by the GridAMP daemon. This avoids the need to deploy and query separate middleware to run grid jobs and provides the transparent end-to-end debugging capability that is useful when things go wrong.
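To illustrate what a single application-defined representation of state means in practice, the sketch below uses Python's standard `sqlite3` module as a stand-in for the database behind the Django models. The table layout, the function names, and the simplification of advancing each simulation by exactly one state per pass are assumptions for illustration; the point is only that the web front end and the daemon coordinate through the same rows, with no separate workflow middleware holding its own copy of the state.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the database shared by the website and daemon."""
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE simulation ('
               ' id INTEGER PRIMARY KEY, star TEXT, state TEXT NOT NULL)')
    return db


def web_submit(db: sqlite3.Connection, star: str) -> int:
    """Web front end: record the request and return immediately."""
    cur = db.execute("INSERT INTO simulation (star, state) VALUES (?, 'QUEUED')",
                     (star,))
    db.commit()
    return cur.lastrowid


def daemon_pass(db: sqlite3.Connection, next_state: dict) -> int:
    """Daemon: advance every unfinished simulation one state; return how many moved."""
    rows = db.execute(
        "SELECT id, state FROM simulation WHERE state != 'DONE'").fetchall()
    for sim_id, state in rows:
        # A real daemon would run grid steps here before updating the row.
        db.execute('UPDATE simulation SET state = ? WHERE id = ?',
                   (next_state[state], sim_id))
    db.commit()
    return len(rows)
```

Since the web interface and the daemon read the same row, a status page and a debugging session both see exactly the state the daemon is acting on.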

We are particularly impressed with the Python-based Django web development framework. For our purposes, Django seemed to balance framework features and customization perfectly, supporting the rapid development of web sites without being a content management system. The programming methodology was intuitive, suggesting but not enforcing a model-view-controller design pattern. The Django framework was useful even for the non-web portions of the project. The self-contained development environment was easy to install and facilitated quick prototyping and debugging. When combined with the Apache web server, the framework was robust enough to function as a production system.

Our use of AJAX and Web 2.0 technologies has been limited to cases where it is clearly beneficial to our user community. For example, the star search functionality suggests stars that are in the Kepler catalog and stars that have results as soon as a user types enough of a catalog identifier to disambiguate possible targets. Given the long job turnaround time, however, opportunities to make the website appear more dynamic are limited. Many other AJAX and social networking features would be possible, and it was very tempting to allow astronomers to "share a star" via Facebook or send simulation progress updates using Twitter. More pragmatically, we are currently working on using RSS feeds to allow astronomers to subscribe to stars of interest and on adding dynamic links to astronomical catalogs and visualization services such as SIMBAD and Google Sky.
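The text does not describe how the star search decides that a prefix is specific enough. A minimal sketch under one assumed rule (offer suggestions only once the typed prefix matches at most `limit` identifiers) is shown below; the function name, the rule, the case- and space-insensitive matching, and the example identifiers are hypothetical, and matching against stars that already have results is omitted.

```python
from typing import Iterable, List


def suggest_stars(prefix: str, catalog: Iterable[str], limit: int = 10) -> List[str]:
    """Return catalog identifiers matching the typed prefix, once few enough remain.

    Matching ignores case and spaces, so 'kic1' matches 'KIC 1...'.
    """
    key = prefix.replace(' ', '').lower()
    if not key:
        return []
    matches = sorted(c for c in catalog
                     if c.replace(' ', '').lower().startswith(key))
    # Only offer suggestions once the prefix has narrowed the candidates enough.
    return matches if len(matches) <= limit else []
```

In a web application, a function like this would sit behind an AJAX endpoint that is queried as the user types.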

Although AMP was designed as a custom solution for a specific model and workflow, we believe that some AMP components may be a useful foundation for similar grid gateway development in the future. Of course, the AMP user interface is completely custom, but Django facilitates rapid web development in its own right. The core AMP models that represent jobs and the base classes of the workflow manager are potentially generic enough to support other applications and workflows with minimal changes. Although we have not done so, it would not be particularly difficult to isolate the common job management functionality from the models so that it could be added to new models as desired. The GridAMP daemon already supports this abstraction: the workflow manager base class contains only grid code, and all application-specific logic is contained in the workflow-specific derived classes. A similar level of abstraction would have to be introduced into the data models, using complementary table schemas or inheritance to make a model represent grid jobs, rather than copying and pasting certain fields into each model definition. In this more generic approach, models would be defined only with application-specific job fields (such as inputs and results), with the job management fields provided externally. Thus, while AMP and its underlying components are clearly not a framework from which new gateways may easily be constructed, AMP demonstrates how rapid web development frameworks combined with simple grid support libraries can be used to produce useful science gateways.
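In Django, one way to provide job management fields externally is an abstract base model (a model whose inner `Meta` class sets `abstract = True`), which concrete models then inherit. To keep the sketch runnable without a Django project, it mirrors that idea with plain dataclass inheritance; `GridJobFields`, `StellarModelRun`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class GridJobFields:
    """Hypothetical generic job-management fields, reusable by any job model."""
    state: str = 'QUEUED'
    remote_job_id: Optional[str] = None
    submitted_at: Optional[datetime] = None
    hold_reason: str = ''


@dataclass
class StellarModelRun(GridJobFields):
    """Application-specific fields only (inputs and results); names are illustrative."""
    star_id: str = ''
    frequencies: List[float] = field(default_factory=list)
    best_fit_radius: Optional[float] = None
```

The application model declares only its inputs and results, while the inherited fields carry the state that a generic workflow manager needs.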

6. FUTURE WORK 

Although AMP is currently being used for friendly user 
testing and we do not anticipate making any fundamental 
changes over the next year or two, we have identified several 
front-end and back-end features that we wish to explore in 
the future. Again, we are currently investigating the best 
way to provide simulation progress and star result updates 
via RSS and refining our use of AJAX techniques to enhance 
the user experience in subtle yet meaningful ways. As the 
number of simulations on AMP grows, we anticipate that 
we will need to revisit the interface used to organize and 
present the results of the simulations. 

One limitation of GridAMP that we intend to examine in the near future is its use of multiple sequential GRAM jobs to carry optimization runs to completion. Although each GRAM job requests the target system's maximum walltime (usually 6 or 24 hours), continuation jobs are only submitted once the prior job has finished. Thus, each continuation job must wait in the remote system's batch queue before processing can resume. Many schedulers in use at TeraGrid sites support job chaining (or job dependencies), such that multiple jobs can be submitted at once and queued independently but declared eligible to run only after a prior job has completed. This would be well suited to AMP jobs, as the initial simulation submission could include the 4-8 jobs that are always required to perform the simulation, possibly reducing the cumulative queue wait time. We are currently developing a graphical tool that plots job wait vs. execution time on a Gantt chart for each AMP simulation, and we are calculating aggregate wait and run time statistics, in order to understand the impact of queue wait time on various systems. We will then investigate Grid-based (but possibly nonstandard) methods to submit chained jobs on the resources at the providers that are the most tolerant of AMP's computational workloads.
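The potential benefit of chaining can be estimated with a deliberately simple model. The sketch assumes that job i would reach the head of the queue `waits[i]` hours after it is submitted, whether it is submitted up front or after its predecessor finishes, and that a dependent job accrues queue position while it waits for its predecessor. Whether held jobs actually accrue priority depends on the scheduler and its configuration, so these numbers are illustrative only. The function names are hypothetical.

```python
from typing import Sequence


def sequential_makespan(waits: Sequence[float], runs: Sequence[float]) -> float:
    """Elapsed hours when each continuation is submitted only after the prior job ends."""
    return sum(waits) + sum(runs)


def chained_makespan(waits: Sequence[float], runs: Sequence[float]) -> float:
    """Elapsed hours when all jobs are submitted at t = 0 with dependencies.

    Job i becomes runnable at the later of two times: when it would reach the
    head of the queue (waits[i]) and when job i-1 finishes.
    """
    end = 0.0
    for wait, run in zip(waits, runs):
        start = max(end, wait)
        end = start + run
    return end
```

For example, with queue waits of 2, 3, 1, and 4 hours and run times of 24, 24, 24, and 10 hours, the sequential estimate is 92 hours and the chained estimate is 84 hours: only the first job's wait remains on the critical path, because the later waits overlap with earlier runs.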

7. CONCLUSIONS 

AMP has provided an opportunity to develop a new science gateway targeting TeraGrid computational resources. AMP's straightforward workflow made it an ideal project for exploring the use of the Python-based Django web framework for rapid prototyping and development of a science gateway. Our separation of the web interface, processing daemon, and science components simplified the system's architecture and implementation. Furthermore, our use of common Django modules for both the web interface and the workflow daemon greatly reduced the complexity of the implementation. The entire workflow was easily implemented using direct Globus command-line client calls to remote scripts and executables, further simplifying debugging and allowing AMP to be configured on remote resources without resource provider intervention. AMP is currently available for friendly user testing, and we anticipate the first extensive use of the system to perform new asteroseismology science using Kepler data in October 2009. In the future, we plan to examine possible applications of AMP's architecture and underlying technology choices to other NCAR science gateway projects.